
JMIR Public Health and Surveillance

JMIR Publications Inc.

All preprints, ranked by how well they match JMIR Public Health and Surveillance's content profile, based on 45 papers previously published here. The average preprint has a 0.11% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.

1
A Comprehensive Statistical Analysis of COVID-19 Trends: Global and U.S. Insights through ARIMA, Regression, and Spatial Models

LEI, Z.

2024-10-23 public and global health 10.1101/2024.10.22.24315932 medRxiv
Top 0.1%
22.0%

The COVID-19 pandemic has driven the need for accurate data analysis and forecasting to guide public health decisions. In this study, we utilized ARIMA and ARIMAX models to predict short-term trends in confirmed COVID-19 cases across different regions, including the United States, Asia, Europe, Africa, and the Americas. Comparisons were made between ARIMA and auto.arima models, and anomaly detection was performed to investigate discrepancies between predictions and actual data. The study also explored the relationship between vaccination rates and new case numbers, and examined how socioeconomic factors such as GDP per capita, HDI, and healthcare resources influenced COVID-19 incidence rates across countries. Our findings provide insights into the effectiveness of predictive models and the significant impact of socioeconomic factors on the spread of the virus, contributing valuable information for future epidemic prevention and control strategies.

2
Number of tests required to flatten the curve of coronavirus disease-2019

Hwang, S.; Kim, J.-H.; Choe, Y. J.; Oh, D.-h.

2021-01-02 public and global health 10.1101/2020.12.26.20248818 medRxiv
Top 0.1%
22.0%

We developed a mathematical model to quantify the number of tests required to stop the spread of coronavirus disease 2019 (COVID-19). Our model analyses performed using the data from the U.S. suggest that the infection coefficient increases by approximately 47% upon relaxing the lockdown policy. To offset the effect of lockdown relaxation, the number of tests should increase by 2.25 times, corresponding to approximately 280,000-360,000 tests per day in April 2020.

3
A Simple Mathematical Model for Estimating the Inflection Points of COVID-19 Outbreaks

Ma, Z.

2020-03-27 health informatics 10.1101/2020.03.25.20043893 medRxiv
Top 0.1%
18.9%

Background: Exponential-like infection growths leading to peaks (which could be the inflection points or turning points) are usually the hallmarks of infectious disease outbreaks including coronaviruses. To predict the inflection points, i.e., inflection time (Tmax) and maximal infection number (Imax) of the novel coronavirus (COVID-19), we adopted a trial and error strategy and explored a series of approaches from simple logistic modeling (that has an asymptotic line) to sophisticated tipping point detection techniques for detecting phase transitions but failed to obtain satisfactory results. Method: Inspired by its success in diversity-time relationship (DTR), we apply the PLEC (power law with exponential cutoff) model for detecting the inflection points of COVID-19 outbreaks. The model was previously used to extend the classic species-time relationship (STR) for general DTR (Ma 2018), and it has two "secondary" parameters (computed from its 3 parameters including power law scaling parameter w, taper-off parameter d to overwhelm virtually exponential growth ultimately, and a parameter c related to initial infections): one that was originally used for estimating the potential or dark biodiversity is proposed to estimate the maximal infection number (Imax) and another is proposed to determine the corresponding inflection time point (Tmax). Results: We successfully estimated the inflection points [Imax, Tmax] for most provinces (approximately 85%) in China with error rates <5% in both Imax and Tmax. We also discussed the constraints and limitations of the proposed approach, including (i) sensitivity to disruptive jumps, (ii) the need for sufficiently long datasets, and (iii) restriction to unimodal outbreaks.

4
A Combined Predictive and Causal Approach for Neighborhood-Level Diabetes Detection

Noaeen, M.; Rostami, A.; Ghanem, I.; Saarela, O.; Keshavjee, K.; Brook, J. R.; Shakeri, Z.

2025-03-05 endocrinology 10.1101/2025.02.28.25323125 medRxiv
Top 0.1%
18.6%

Objective: Develop a neighborhood-level framework using machine learning and causal inference to identify socioeconomic and behavioral drivers of Type 2 diabetes for targeted public health interventions. Materials and Methods: Data from 1,149 Census Tracts in Toronto were integrated, linking demographic, health, and marginalization indices. Seven machine learning models classified neighborhoods with high diabetes prevalence. Feature engineering mitigated skewness and correlation, while Causal Forests estimated the Conditional Average Treatment Effect (CATE, τ) for predictors such as work stress, smoking, and mental health. Results: Predictive models achieved over 90% recall and high AUC metrics on both test and external validation datasets. Key predictors included obesity, overweight status, physical activity, and log-transformed median age. Causal analysis further indicated that elevated work stress (τ = 0.312) and daily smoking (τ = 0.155) increased diabetes risk, while stronger mental health (τ ≈ -1.1) was protective. Discussion: While genetic and clinical factors often dominate the conversation on diabetes, data is often restricted to confirmed diagnoses or not readily available for prevalence analyses. Our study shows how neighborhood contexts, including walkability, stress levels, and socioeconomic differences, help drive rising disease rates. We integrated machine learning classifiers with causal inference to examine how interventions, such as active transportation and adjusted work stress, could shift diabetes risk. Conclusion: This integrated method offers a blueprint for precision public health by clarifying how modifiable neighborhood factors affect diabetes risk. It can help tailor interventions to community needs and is applicable to other areas facing similar chronic disease challenges.

5
Understanding Public Attitudes Towards Human Papillomavirus Vaccination in Japan: Insights from Social Media Stance Analysis Using Large Language Models

Niu, Q.; Liu, J.

2024-10-07 health informatics 10.1101/2024.10.07.24315018 medRxiv
Top 0.1%
18.5%

Background: Despite the reinstatement of proactive human papillomavirus (HPV) vaccine recommendations in 2022, Japan continues to face persistently low HPV vaccination rates, posing significant public health challenges. Misinformation, complacency, and accessibility issues have been identified as key factors undermining vaccine uptake. Objective: This study aims to understand how factors such as misinformation, public health events, and attitudes toward other vaccines, like COVID-19, influence HPV vaccine hesitancy, by analyzing the evolution of public attitudes towards HPV vaccination in Japan through social media content. Methods: We collected tweets related to the HPV vaccine from 2011 to 2021. Traditional natural language processing (NLP) methods and large language models (LLMs) were utilized to perform stance analysis on the collected data. The analysis included stance identification, time series analysis, topic modeling, and logic analysis. We framed our findings within the context of the WHO's 3Cs model: Confidence, Complacency, and Convenience. Results: Public confidence in the HPV vaccine fluctuated in response to government policies and media events, with misinformation playing a critical role in eroding trust. Complacency increased following the suspension of recommendations in 2013 but decreased as advocacy resumed in 2020. Accessibility (Convenience) was also found to be a key determinant of vaccination uptake. HPV vaccines are often used as supportive evidence towards other vaccines, such as COVID-19. Conclusions: Our findings underscore the importance of targeted public health interventions to restore and maintain vaccine confidence in Japan. While vaccine confidence has shown a slow increase, sustained efforts are necessary to secure long-term improvements. Confidence in one vaccine may positively influence perceptions of other vaccines. Addressing misinformation, reducing complacency, and enhancing vaccine accessibility are key strategies to improve uptake. Increased confidence in HPV vaccines appeared to have a positive influence on confidence in other vaccines, such as COVID-19. This study also demonstrates the utility of LLMs in offering a deeper understanding of public health attitudes. To effectively combat vaccine hesitancy and improve coverage, interventions must prioritize consistent communication, localized strategies, and an integrated approach to vaccine narratives.

6
Using Twitter Data Analysis to Understand the Perceptions, Awareness, and Barriers to the Wide Use of Pre-Exposure Prophylaxis in the United States

Erdengsileng, A.; Tian, S.; Green, S. S.; Naar, S.; He, Z.

2022-12-20 health informatics 10.1101/2022.12.19.22283677 medRxiv
Top 0.1%
18.4%

User-generated social media posts such as tweets can provide insights about the public's perception, cognitive, and behavioral responses to health-related issues. Pre-Exposure Prophylaxis (PrEP) is one of the most effective ways to reduce the risk of HIV infection. However, its utilization is low in the US, especially among populations disproportionately affected by HIV such as people under 24 years old. It is therefore important to understand the barriers to the wider use of PrEP in the US using social media posts. In this study, we collected tweets from Twitter about PrEP in the past 4 years to identify such barriers by first identifying tweets about personal discussions, and then performing textual analysis using word analysis, UMLS semantic type analysis, and topic modeling. We found that the public often discussed advocacy, risks/benefits, access, pricing, insurance coverage, legislation, stigma, health education, and prevention of HIV. This result is consistent with the literature and can help identify strategies for promoting the use of PrEP, especially among young adults.

7
Creation of a Clinical Decision-Support Tool for Assigning Occupational Disability to United States Air Force Personnel

Uptegraft, C. C.; Witkop, C. T.

2020-05-11 health informatics 10.1101/2020.05.07.20090530 medRxiv
Top 0.1%
18.4%

Occupational dispositions (profiles) are the top reason active duty service members are not medically ready to deploy or fulfill their job responsibilities. An audit across multiple U.S. Air Force (AF) medical treatment facilities revealed significant shortcomings in how medical providers assign profiles. We aimed to create a predictive model and a decision-support tool that estimates profile duration. Using retrospective profiles (n=1,546,805) from the Aeromedical Services Information Management System between 1 Feb 2007 and 31 Jan 2017, we built and validated a decision-support tool that estimates profile length. Multivariate quantile regressions (n=2,575) were performed across five quantiles and six levels of diagnostic specificity for every diagnostic code with 2,100 or more observations. The models universally estimated profile duration with very poor accuracy (pseudoR2 0.000 to 0.168); however, predictive ability was directly correlated with quantile level with minimal variation by diagnostic specificity. Age, O4 to O6+ ranks, very heavy job class, and co-morbid conditions were all significant in more than 25.0% of regressions down all levels of diagnostic specificity. Age, co-morbid conditions, E7-E9 ranks, O4 to O6+ ranks, and light job class all added days to profile duration while E1 to E4 ranks, heavy, and very heavy job class subtracted days. While this study failed to produce an accurate tool, several findings, the indirect correlation between profile duration and very heavy job class and the assignment of durations based on convenient calendar times, warrant further investigation. For now, providers may consult existing decision-support tools when building profiles for AF service members, heeding attention that they were built with non-representative civilian populations. 
Disclaimer: The views expressed are solely those of the authors and do not reflect the official policy or position of the US Army, US Navy, US Air Force, the Department of Defense, or the US Government.

8
Ensemble Forecasts of Coronavirus Disease 2019 (COVID-19) in the U.S.

Ray, E. L.; Wattanachit, N.; Niemi, J.; Kanji, A. H.; House, K.; Cramer, E. Y.; Bracher, J.; Zheng, A.; Yamana, T. K.; Xiong, X.; Woody, S.; Wang, Y.; Wang, L.; Walraven, R. L.; Tomar, V.; Sherratt, K.; Sheldon, D.; Reiner, R. C.; Prakash, B. A.; Osthus, D.; Li, M. L.; Lee, E. C.; Koyluoglu, U.; Keskinocak, P.; Gu, Y.; Gu, Q.; George, G. E.; Espana, G.; Corsetti, S.; Chhatwal, J.; Cavany, S.; Biegel, H.; Ben-Nun, M.; Walker, J.; Slayton, R.; Lopez, V.; Biggerstaff, M.; Johansson, M. A.; Reich, N. G.; COVID-19 Forecast Hub Consortium

2020-08-22 epidemiology 10.1101/2020.08.19.20177493 medRxiv
Top 0.1%
18.3%

Background: The COVID-19 pandemic has driven demand for forecasts to guide policy and planning. Previous research has suggested that combining forecasts from multiple models into a single "ensemble" forecast can increase the robustness of forecasts. Here we evaluate the real-time application of an open, collaborative ensemble to forecast deaths attributable to COVID-19 in the U.S. Methods: Beginning on April 13, 2020, we collected and combined one- to four-week ahead forecasts of cumulative deaths for U.S. jurisdictions in standardized, probabilistic formats to generate real-time, publicly available ensemble forecasts. We evaluated the point prediction accuracy and calibration of these forecasts compared to reported deaths. Results: Analysis of 2,512 ensemble forecasts made April 27 to July 20 with outcomes observed in the weeks ending May 23 through July 25, 2020 revealed precise short-term forecasts, with accuracy deteriorating at longer prediction horizons of up to four weeks. At all prediction horizons, the prediction intervals were well calibrated with 92-96% of observations falling within the rounded 95% prediction intervals. Conclusions: This analysis demonstrates that real-time, publicly available ensemble forecasts issued in April-July 2020 provided robust short-term predictions of reported COVID-19 deaths in the United States. With the ongoing need for forecasts of impacts and resource needs for the COVID-19 response, the results underscore the importance of combining multiple probabilistic models and assessing forecast skill at different prediction horizons. Careful development, assessment, and communication of ensemble forecasts can provide reliable insight to public health decision makers.
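The calibration figure in this abstract (the share of observations falling inside the 95% prediction intervals) is an empirical coverage rate. A minimal Python sketch of how such a rate can be computed is given here; the function name and the numbers are illustrative assumptions, not the Forecast Hub's code or data.

```python
def interval_coverage(observed, lower, upper):
    """Fraction of observations that fall inside their prediction intervals.

    Bounds are treated as inclusive. All three sequences must be aligned,
    one entry per forecast target.
    """
    if not (len(observed) == len(lower) == len(upper)) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(lo <= y <= hi for y, lo, hi in zip(observed, lower, upper))
    return hits / len(observed)


# Made-up values for illustration only (not forecast data).
observed = [100, 250, 90, 410]
lower = [80, 200, 95, 300]
upper = [120, 300, 130, 500]

coverage = interval_coverage(observed, lower, upper)
print(f"empirical coverage: {coverage:.2f}")
```

For well-calibrated 95% intervals, the empirical coverage over many forecasts should sit close to 0.95.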

9
Title: COVID-19Predict - Predicting Pandemic Trends.

Bosch, J.; Wilson, A.; O'Neil, K.; Zimmerman, P. A.

2020-09-11 public and global health 10.1101/2020.09.09.20191593 medRxiv
Top 0.1%
18.3%

Background: Given the global public health importance of the COVID-19 pandemic, data comparisons that predict on-going infection and mortality trends across national, state and county-level administrative jurisdictions are vitally important. We have designed a COVID-19 dashboard with the goal of providing concise sets of summarized data presentations to simplify interpretation of basic statistics and location-specific current and short-term future risks of infection. Methods: We perform continuous collection and analyses of publicly available data accessible through the COVID-19 dashboard hosted at Johns Hopkins University (JHU GitHub). Additionally, we utilize the accumulation of cases and deaths to provide dynamic 7-day short-term predictions on these outcomes across these national, state and county administrative levels. Findings: COVID-19Predict produces 2,100 daily predictions [or calculations] on the state level (50 states × 3 models × 7 days × 2 outcomes, cases and deaths) and 131,964 (3,142 counties × 3 models × 7 days × 2 outcomes, cases and deaths) on the county level. To assess how robustly our models have performed in making short-term predictions over the course of the pandemic, we used available case data for all 50 U.S. states spanning the period January 20 - August 16 2020 in a retrospective analysis. Results showed a 3.7% to -0.2% mean error of deviation from the actual case predictions to date. Interpretation: Our transparent methods and admin-level visualizations provide real-time data reporting and forecasts related to on-going COVID-19 transmission allowing viewers (individuals, health care providers, public health practitioners and policy makers) to develop their own perspectives and expectations regarding public life activity decisions. Funding: Financial resources for this study have been provided by Case Western Reserve University.
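The daily prediction counts quoted in this abstract are products of jurisdictions, models, forecast days, and outcomes. A short Python check of that arithmetic follows; the function name is a hypothetical helper introduced here, not part of COVID-19Predict.

```python
def daily_predictions(jurisdictions: int, models: int = 3,
                      days: int = 7, outcomes: int = 2) -> int:
    """One prediction per jurisdiction, model, forecast day, and outcome.

    Defaults mirror the factors stated in the abstract: 3 models,
    7 forecast days, and 2 outcomes (cases and deaths).
    """
    return jurisdictions * models * days * outcomes


state_level = daily_predictions(50)      # 50 states
county_level = daily_predictions(3142)   # 3,142 counties
print(state_level, county_level)
```

Both totals agree with the figures stated in the abstract.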

10
Diabetes related emergency department visits among adults during the COVID-19 pandemic - an analysis of data from the New York City syndromic surveillance system

Zhilkova, A.; Lall, R.; Mathes, R.; Chamany, S.; Olson, D.

2023-12-21 public and global health 10.1101/2023.12.20.23300289 medRxiv
Top 0.1%
18.0%

Objective: New York City was an early epicenter of the COVID-19 pandemic. We aim to describe population level epidemiological trends in diabetes related emergency department (ED) visits among adults in New York City, for the period prior to and encompassing the first four waves of the pandemic. Research Design and Methods: We used data from the New York City ED syndromic surveillance system during December 30, 2018 through May 21, 2022. This system captures all visits from EDs in the city in near-real time. We matched these visits to laboratory confirmed COVID-19 positivity data beginning with February 15, 2020. Results: Compared to pre-pandemic baseline levels, diabetes related ED visits noticeably increased during the first wave in spring 2020, though this did not necessarily translate to net increases overall during that period. Visits for diabetic ketoacidosis, particularly among adults with type 2 diabetes, sharply increased before returning to pre-pandemic levels, most notably during wave 1 and wave 4 in winter 2021-2022. Trajectories of diabetes-related ED visits differed by diabetes type, age, and sex. Some ED visit trends did not return to pre-pandemic baseline levels. Conclusions: The COVID-19 pandemic, especially the first wave in the spring of 2020, coincided with a dramatic shift in diabetes related ED utilization in New York City. Our findings highlight the importance of on-going surveillance of health care utilization for chronic diseases during population-level emergencies like pandemics. A robust syndromic surveillance system that includes infectious and non-infectious syndromes is useful to better prepare, mitigate, and respond to population-level events.
Article Highlights:
- Diabetes related emergency department (ED) visits in New York City increased dramatically with the emergence of the COVID-19 pandemic in spring 2020.
- The trajectory of diabetes-related ED visits differed by diabetes type, age, sex, and pandemic wave.
- The diabetes complication of diabetic ketoacidosis among adults with type 2 diabetes showed sharp increases in the first and fourth waves of the pandemic, respectively its initial emergence in spring 2020 and the Omicron variant in winter 2021-2022.
- Our findings highlight the importance of on-going surveillance of health care utilization for chronic diseases during population-level emergencies like pandemics.
Summary: Data from NYC's syndromic surveillance system showed major increases in #type2diabetes complications (e.g. diabetic ketoacidosis) during #COVID-19 waves 1 and 4 (Omicron) - this tool may be useful for population-level monitoring of chronic disease complications during emergencies

11
Estimating the Growth Rate and Doubling Time for Short-Term Prediction and Monitoring Trend During the COVID-19 Pandemic with a SAS Macro

Xu, S.; Clarke, C.; Shetterly, S.; Narwaney, K.

2020-04-11 public and global health 10.1101/2020.04.08.20057943 medRxiv
Top 0.1%
17.8%

Coronavirus disease (COVID-19) has spread around the world, causing tremendous stress to the US health care system. Knowing the trend of the COVID-19 pandemic is critical for the federal and local governments and health care system to prepare plans. Our aim was to develop an approach and create a SAS macro to estimate the growth rate and doubling time in days if growth rate is positive or half time in days if growth rate is negative. We fit a series of growth curves using a rolling approach. This approach was applied to the hospitalization data of Colorado State between March 13th and April 13th. The growth rate was 0.18 (95% CI=(0.11, 0.24)) and the doubling time was 5 days (95% CI= (4, 7)) for the period of March 13th-March 19th; the growth rate reached its minimum of -0.19 (95% CI= (-0.29, -0.10)) and the half time was 4 days (95% CI= (2, 6)) for the period of April 2nd - April 8th. This approach can be used for regional short-term prediction and monitoring the regional trend of the COVID-19 pandemic.
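The link between a growth rate and a doubling or half time can be illustrated with the textbook relation for continuous exponential growth, N(t) = N0 · exp(r · t), which gives a doubling (or halving) time of ln(2) / |r|. The Python sketch below assumes that model with r expressed per day. The abstract does not show the SAS macro, which may define or round these quantities differently, so this sketch is not expected to reproduce the reported values.

```python
import math


def doubling_or_half_time(growth_rate: float) -> float:
    """Days to double (rate > 0) or to halve (rate < 0).

    Assumes continuous exponential growth N(t) = N0 * exp(r * t) with r
    per day. This is the textbook relation, not the abstract's SAS macro.
    """
    if growth_rate == 0:
        raise ValueError("a zero growth rate has no doubling or half time")
    return math.log(2) / abs(growth_rate)


for r in (0.18, -0.19):
    label = "doubling" if r > 0 else "half"
    print(f"r = {r:+.2f}: {label} time = {doubling_or_half_time(r):.2f} days")
```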

12
Google Searches for Taste and Smell Loss Anticipate Covid-19 Epidemiology

Lippi, G.; Henry, B. M.; Mattiuzzi, C.; Sanchis-Gomar, F.

2020-11-12 public and global health 10.1101/2020.11.09.20228510 medRxiv
Top 0.1%
17.7%

Background: As evidence emerged that loss of taste and/or loss of smell is frequently triggered by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection, we investigated whether Google search volume for these two disease-specific symptoms could be associated with disease epidemiology in the United States (US). Materials and Methods: We performed an electronic search in Google Trends using the keywords "taste loss" and "smell loss" within the US. The Google search volume was correlated with the number of new weekly cases of coronavirus disease 2019 (COVID-19) in the country. Results: The weekly Google searches for taste and smell loss exhibited a trend similar to that of new weekly SARS-CoV-2 infections in the US. A nearly perfect correlation was found between Google Trends scores of taste and smell loss (r=0.98; 95% CI, 0.97-0.99; p<0.001). Although a significant association was found between Google searches for the two symptoms and the number of new weekly SARS-CoV-2 infections reported during the same week, the correlation improved over time. The highest correlation was found comparing Google Trends scores for taste or smell loss and the number of new weekly SARS-CoV-2 infections two weeks later. The correlation coefficient of summing Google Trends scores for the two symptoms and the number of new weekly SARS-CoV-2 infections two weeks later was 0.82 (95% CI, 0.68-0.90; p<0.001), and was associated with 0.89 diagnostic accuracy. Conclusions: These findings suggest that Google search numbers for olfactory and gustatory dysfunctions may help predict the epidemiological trajectory of COVID-19 early, before official reporting.
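The comparison described in this abstract, search scores in one week against case counts some weeks later, is a lagged Pearson correlation. The Python sketch below shows that calculation on synthetic data; the function names and the series are assumptions for illustration and do not come from the study.

```python
import math
import random


def pearson_r(x, y):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def lagged_correlation(searches, cases, lag_weeks):
    """Correlate searches in week t with cases in week t + lag_weeks."""
    if lag_weeks < 0 or lag_weeks > len(searches) - 2:
        raise ValueError("lag must leave at least two overlapping weeks")
    if lag_weeks > 0:
        searches, cases = searches[:-lag_weeks], cases[lag_weeks:]
    return pearson_r(searches, cases)


# Synthetic series only: cases are built to echo searches two weeks later.
rng = random.Random(0)
searches = [rng.random() for _ in range(30)]
cases = [rng.random(), rng.random()] + searches[:-2]

best_lag = max(range(5), key=lambda k: lagged_correlation(searches, cases, k))
print("lag with highest correlation:", best_lag)
```

Scanning a small range of lags and keeping the one with the highest coefficient mirrors the abstract's finding that the association was strongest at a two-week offset, although the study's own procedure may differ.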

13
A New, Simple Projection Model for COVID-19 Pandemic

Lu, J.

2020-03-24 epidemiology 10.1101/2020.03.21.20039867 medRxiv
Top 0.1%
17.6%

Background: With the worldwide outbreak of COVID-19, an accurate model to predict how the coronavirus pandemic will evolve in individual countries becomes important and urgent. Our goal is to provide a prediction model to help policy makers in different countries address the epidemic outbreak and adjust the control policies to contain the spread of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) more effectively. Methods: Unlike the classic public health and virus propagation models, this new projection model takes both government intervention and public response into account to generate reliable projections of the outbreak 10 days to 2 weeks in advance. This method is an observation-based projection similar to the classic Moore's Law in microelectronics. Moore's Law is not based on any law of physics and yet has anticipated the development of microelectronics for decades. This work is an empirical relation to describe the evolution from epidemic to pandemic situations in different countries. The country was selected as the observation unit because regulation and political decisions are made at the national level for numerous measures such as the implementation of social distancing, the quarantine of suspected cases, and the closing of borders to achieve territorial containment. Findings: This model has been successfully applied to predict the evolution of the pandemic situation in China. The model was then also validated with the South Korean data. With a reduction of cases calculated as reduction coefficient of the increase rate of daily cases Rc = 2% per day, we observed a very efficient policy with a strict systematic control in both China and South Korea. For the moment, Canada, the USA, and Australia may have difficulty limiting the fast evolution of the epidemic. With an Rc<0.5%, it is particularly important for the USA to consider escalating the control measures because the affected cases can reach more than one million very soon. Interpretation: Owing to differences in national discipline and historical culture, the national policy may be implemented and observed with different efficiency. The starting point where the government decided to apply total containment can also play a key role in the evolution of the pandemic situation. The model will allow each national government of the nations still affected by the pandemic to project the situation for the coming 10 to 14 days. This is very important for the deployment of national and international efforts to stop the pandemic situation. Funding: National Key R&D Program of China (Ministry of Science & Technology (MOST, China))

14
Reassessing the First Year of COVID-19: Estimating Infections and Tracking Pandemic Trends with Probabilistic Bias Analysis

Harkare, H. V.

2025-11-22 infectious diseases 10.1101/2025.11.21.25338950 medRxiv
Top 0.1%
17.1%

Background: The novel SARS-CoV-2 virus, first identified in China in December 2019, rapidly spread worldwide, resulting in 114 million confirmed cases and 2.5 million reported deaths by February 28, 2021. Yet considerable uncertainty persists regarding the true scale of infections, as limited testing capacity and inconsistent surveillance substantially hindered accurate case detection. Robust estimates of infection burden are essential for understanding the pandemic's full impact and guiding effective public health responses. Methods: This study aimed to quantify the health burden of COVID-19 in India, Mexico, the United Kingdom, and the United States. A probabilistic bias analysis model is used to estimate the true number of SARS-CoV-2 infections, accounting for factors such as repeated testing, infection waves, and viral mutations to provide a more accurate assessment of the pandemic's true scale in each country. Results: By February 28, 2021, the estimated total COVID-19 infections across India, Mexico, the United Kingdom, and the United States reached 286.7 million - more than six times the reported cases. India had the highest estimated infections (129.3 million), followed by the United States (98.6 million), Mexico (44.9 million), and the United Kingdom (14 million). Detection rates varied significantly, with Mexico underreporting infections by a factor of 22, while the United Kingdom and the United States had the highest detection rates. Testing capacity played a key role, with high-income countries conducting over four times more tests per 1,000 people than lower-income nations. Conclusion: This study demonstrates substantial underreporting of SARS-CoV-2 during the pre-vaccination period. These retrospective estimates provide a more accurate historical baseline for interpreting pandemic dynamics and remain valuable for assessing long-term health impacts and improving preparedness for future epidemics.

15
COVID-19: Easing the coronavirus lockdowns with caution

Alabi, R. O.; Siemuri, A.; Elmusrati, M.

2020-05-14 health informatics 10.1101/2020.05.10.20097295 medRxiv
Top 0.1%
17.1%

Background: The spread of the novel severe acute respiratory syndrome coronavirus (SARS-CoV-2) has reached a global level, creating a pandemic. The governments of various countries, their citizens, politicians, and business owners are worried about the unavoidable economic impacts of this pandemic. There is therefore eagerness for the pandemic to peak. Objectives: This study uses an objective approach to emphasize the need to be pragmatic with easing of lockdown measures worldwide through the forecast of the possible trend of COVID-19. This is necessary to ensure that the enthusiasm about SARS-CoV-2 peaking is properly examined and that easing of lockdown is done systematically to avoid a second wave of the pandemic. Methods: We used Facebook Prophet on World Health Organization data for COVID-19 to forecast the spread of SARS-CoV-2 from 7 April to 3 May 2020. The forecast model was further used to forecast the trend of the virus from 8 to 14 May 2020. We present forecasts of confirmed cases and deaths. Results: Our findings from the forecast showed an increase in the number of new cases for this period. Therefore, the need for easing the lockdown with caution becomes imperative. Our model showed good performance when compared to the official report from the World Health Organization. The average forecasting accuracy of our model was 79.6%. Conclusion: Although the global and economic impact of COVID-19 is daunting, excessive optimism about easing the lockdown should be appropriately weighed against the risk of underestimating its spread. As seen globally, the risks appeared far from being symmetric. Therefore, the forecasting provided in this study offers an insight into the spread of the virus for effective planning and decision-making in terms of easing the lockdowns in various countries.

16
Using ICD-10-based Social Determinants of Health Categories to Assess Patients Risk for Acute Care Utilization

Nguyen, P. H.; Wang, J.; Garcia-Filion, P.; Dominick, D.; Abbaszadegan, H.; Gonzalez Hernandez, G.; O'Connor, K.; Rehman, S. U.; Panchanathan, S. S.

2021-01-15 health informatics 10.1101/2021.01.14.21249618 medRxiv
Top 0.1%
16.9%

Background: There has been an increasing recognition of the influence of social, behavioral, economic, and environmental factors on overall patient health. The purpose of this project was to leverage the ICD-10 codes to identify and link social determinants of health (SDoH) to patients with a high probability of utilizing acute care services and to determine if social service intervention reduced care utilization. Methods: We analyzed retrospective data for active patients at a Department of Veterans Affairs Medical Center (VAMC) from 2015-2017. Eleven categories of SDoH were developed based on existing literature of the social determinants; the relevant ICD-10 codes were divided among these categories. Emergency Room (ER) visits, hospital admissions, and social work visits were determined for each patient in the cohort. Results: In a cohort of 44,401 patients, the presence of ICD-10 codes within the EHR in the 11 SDoH categories was positively correlated with increased acute care utilization. Veterans with at least one SDoH risk factor were 71% (95%CI: 68% - 75%) more likely to use the ED and 71% (95%CI: 65%-77%) more likely to be admitted to the hospital. Utilization decreased with social service interventions. Conclusion: This project demonstrates a potentially meaningful method to capture patient social risk profiles through existing EHR data in the form of ICD-10 codes, which can be used to identify the highest risk patients for intervention with the understanding that not all SDoH codes are uniformly used and some SDoHs may not be captured.

17
Assessing the accuracy of California county level COVID-19 hospitalization forecasts to inform public policy decision making

White, L. A.; McCorvie, R.; Crow, D.; Jain, S.; Leon, T. M.

2022-11-10 public and global health 10.1101/2022.11.08.22282086 medRxiv
Top 0.1%
16.9%
Show abstract

Background: The COVID-19 pandemic has highlighted the role of infectious disease forecasting in informing public policy. However, significant barriers remain for effectively linking infectious disease forecasts to public health decision making, including a lack of model validation. Forecasting model performance and accuracy should be evaluated retrospectively to understand under which conditions models were reliable and could be improved in the future. Methods: Using archived forecasts from the California Department of Public Health's California COVID Assessment Tool (https://calcat.covid19.ca.gov/cacovidmodels/), we compared how well different forecasting models predicted COVID-19 hospitalization census across California counties and regions during periods of Alpha, Delta, and Omicron variant predominance. Results: Based on mean absolute error estimates, forecasting models had variable performance across counties and through time. When accounting for model availability across counties and dates, some individual models performed consistently better than the ensemble model, but model rankings still differed across counties. Local transmission trends, variant prevalence, and county population size were informative predictors for determining which model performed best for a given county based on a random forest classification analysis. Overall, the ensemble model performed worse in less populous counties, in part because of fewer model contributors in these locations. Conclusions: Ensemble model predictions could be improved by incorporating geographic heterogeneity in model coverage and performance. Consistency in model reporting and improved model validation can strengthen the role of infectious disease forecasting in real-time public health decision making.
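
The comparison described in this abstract rests on mean absolute error (MAE) computed per model and per county. The sketch below is only an illustration of that metric, not the authors' pipeline: the function names, the nested-dictionary layout, and the idea of skipping models with no forecast for a county are all assumptions made for the example.

```python
from statistics import mean


def mean_absolute_error(observed, forecast):
    """Average absolute difference between observed and forecast values."""
    if not observed or len(observed) != len(forecast):
        raise ValueError("series must be non-empty and of equal length")
    return mean(abs(o - f) for o, f in zip(observed, forecast))


def best_model_per_county(observed_by_county, forecasts_by_model):
    """Return {county: model name with the lowest MAE}.

    Hypothetical layout:
      observed_by_county: {county: [hospital census values]}
      forecasts_by_model: {model: {county: [forecast values]}}
    Models with no forecast for a county are skipped for that county,
    so rankings only compare models that were actually available there.
    """
    best = {}
    for county, observed in observed_by_county.items():
        scores = {
            model: mean_absolute_error(observed, by_county[county])
            for model, by_county in forecasts_by_model.items()
            if county in by_county
        }
        if scores:
            best[county] = min(scores, key=scores.get)
    return best
```

Because the score is computed separately for each county, the same model can rank first in one county and last in another, which is the kind of variation the abstract reports.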

18
Predictive model for real-world performance of COVID-19 antigen tests based on laboratory evaluation

Bosch, M. E.; Garcia, D.; Rudtner, L.; Salcedo, N.; Colmenares, R.; Hoche, S.; Arocha, J. G.; Hall, D.; Moreno, A.; Bosch, I.

2024-10-22 health informatics 10.1101/2024.10.21.24315762 medRxiv
Top 0.1%
15.2%
Show abstract

Controlling the spread of disease due to infectious agents requires a quick response from the public health sector. In the ongoing COVID-19 pandemic, the use of antigen tests has proven to be an excellent tool to inform authorities and mitigate the spread of the disease. In this communication we demonstrate how the performance of an antigen test, as an in vitro diagnostic device, can be properly validated using quantitative laboratory experimentation and self-testing data from a clinical study. We also show how the clinical performance of an antigen test can be predicted using mathematical modeling. The proposed methodology for appraising antigen test performance under real-world conditions could be a useful tool to inform regulatory decision making. This approach makes it possible to standardize, democratize, and speed up the validation, analysis, and comparison of rapid antigen tests, and thus to help develop effective public health response strategies.

19
Estimating the scale of COVID-19 Epidemic in the United States: Simulations Based on Air Traffic directly from Wuhan, China

Li, D.; Lv, J.; Botwin, G.; Braun, J.; Cao, W.; Li, L.; McGovern, D. P. B.

2020-03-08 epidemiology 10.1101/2020.03.06.20031880 medRxiv
Top 0.1%
14.7%
Show abstract

Introduction: Coronavirus Disease 2019 (COVID-19) infection has been characterized by rapid spread and unusually large case clusters. It is important to have an estimate of the current state of the COVID-19 epidemic in the U.S. to help develop informed public health strategies. Methods: We estimated the potential scale of the COVID-19 epidemic (as of 03/01/2020) in the U.S. from cases imported directly from the Wuhan area. We used simulations based on transmission dynamics parameters estimated from previous studies and air traffic data from Wuhan to the U.S., and deliberately built our model on conservative assumptions. Detection and quarantine of individual COVID-19 cases in the U.S. before 03/01/2020 were also taken into account. An SEIR model was used to simulate the growth of the number of infected individuals in the Wuhan area and in the U.S. Results: With the most likely model, we estimated that there would be 9,484 infected cases (90% CI 2,054-24,241) as of 03/01/2020 if no successful intervention procedure had been taken to reduce the transmissibility in unidentified cases. Assuming current preventive procedures have reduced transmissibility in unidentified cases by 25%, the number of infected cases would be 1,043 (90% CI 107-2,474). Conclusion: Our research indicates that, as of 03/01/2020, it is likely that there are already thousands of individuals in the U.S. infected with SARS-CoV-2. Our model is dynamic and is available to the research community to further evaluate as the situation becomes clearer.
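
For readers unfamiliar with the SEIR model mentioned in this abstract, the following is a minimal deterministic sketch of its four compartments (susceptible, exposed, infectious, recovered). It is not the authors' model: it omits air-traffic importation, case detection, and quarantine, and the function name, the Euler time-stepping, and every parameter value are assumptions chosen for illustration.

```python
def simulate_seir(population, initial_exposed, beta, sigma, gamma, days, dt=0.1):
    """Integrate a basic SEIR model with fixed-step Euler updates.

    beta:  transmission rate per day (assumed value supplied by the caller)
    sigma: rate of leaving the exposed state (1 / incubation period)
    gamma: rate of leaving the infectious state (1 / infectious period)
    Returns a list of (S, E, I, R) tuples, one per day, starting at day 0.
    """
    s = float(population - initial_exposed)
    e = float(initial_exposed)
    i = 0.0
    r = 0.0
    history = [(s, e, i, r)]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt   # S -> E
            new_infectious = sigma * e * dt                # E -> I
            new_recovered = gamma * i * dt                 # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


# Illustrative call with made-up parameters; a 25% cut in transmissibility
# could be explored by passing beta * 0.75 instead of beta.
trajectory = simulate_seir(
    population=1_000_000, initial_exposed=10,
    beta=0.5, sigma=0.2, gamma=0.1, days=30,
)
```

Each step only moves people between compartments, so the four values always sum to the population size, which is a convenient sanity check on any implementation.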

20
AI-Driven Feature Selection Using Only Survey Variable Descriptions: Large Language Models Identify Adolescent Vaping Predictors

Zhang, K.; Zhao, Z.; Hu, Y.; Le, T.

2026-03-09 health informatics 10.64898/2026.03.06.26347816 medRxiv
Top 0.1%
14.7%
Show abstract

Objective: To evaluate the effectiveness of various Large Language Models (LLMs) in identifying reliable predictors of Electronic Nicotine Delivery Systems (ENDS) initiation among adolescents, using solely large-scale survey variable descriptions. Methods: A cohort of 7,943 tobacco-naive adolescents aged 12-16 years from the Population Assessment of Tobacco and Health (PATH) Study was analyzed to predict ENDS use at wave 5. Four instruction-tuned LLMs - GPT-4o, LLaMA 3.1-70B, Qwen 2.5-72B-Instruct, and DeepSeek-V3 - were systematically evaluated for text-based feature selection using only variable descriptions from wave 4.5. Selected features were used to train LightGBM classifiers, with model performance compared to a baseline. Results: Our findings reveal notable consistency among the four instruction-tuned LLMs, with substantial overlap in the top predictors each model identified. These selected variables spanned critical domains such as peer and household influence, risk perception, and exposure to tobacco-related cues. LightGBM classifiers trained on PATH wave 4.5-5 data using features selected by the LLMs demonstrated strong predictive performance. Notably, Qwen 2.5-72B-Instruct achieved an AUC of 0.791 with 30 predictors, surpassing the baseline AUC of 0.768. Discussion: The substantial overlap among the top predictors identified by different LLMs suggests a shared reasoning process, despite variations in model architecture and training. LightGBM classifiers trained on these LLM-selected features achieved performance comparable to, or exceeding, models trained on the full set of survey variables, underscoring the high quality of features selected solely from textual descriptions. Moreover, these findings are consistent with previous tobacco regulatory research, further validating the effectiveness of LLM-driven feature selection.
Conclusion: Instruction-tuned large language models can effectively perform text-based feature selection using survey variable descriptions alone, without accessing raw survey data. This scalable, interpretable, and privacy-preserving framework holds promise for behavioral health research and tobacco use surveillance.